
Hallucination Policies

Overview

Hallucination policies detect model hallucinations in real time. DynamoGuard currently supports the following hallucination metrics:

  • Summarization Consistency: Measures the logical consistency between an input text and a model-generated summary. This metric is computed using a Natural Language Inference (NLI) model and provides an entailment probability describing the degree to which the summary logically follows from the input text (see the first sketch after this list).
  • Retrieval Relevance (RAG models only): Evaluates the relevance of the retrieved context to the user input. This metric uses an LLM for scoring and provides a binary output (see the LLM-judge sketch after this list).
  • Response Faithfulness (RAG models only): Evaluates the faithfulness of the model response to the retrieved context. This metric uses an LLM for scoring and provides a binary output.
  • Response Relevance (RAG models only): Evaluates the relevance of the model response to the user input. This metric uses an LLM for scoring and provides a binary output.
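
As a rough illustration of the Summarization Consistency metric, the sketch below computes an entailment probability for an input/summary pair with an off-the-shelf NLI model. The checkpoint name (roberta-large-mnli) and the Hugging Face transformers usage are assumptions for illustration only; they do not describe DynamoGuard's internal implementation.

```python
# Minimal sketch: scoring summarization consistency with an off-the-shelf NLI model.
# The checkpoint name and the `transformers` API shown here are illustrative
# assumptions, not DynamoGuard's internal implementation.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "roberta-large-mnli"  # any NLI checkpoint with an "ENTAILMENT" label works

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)

def entailment_probability(input_text: str, summary: str) -> float:
    """Probability that the input text (premise) entails the generated summary (hypothesis)."""
    encoded = tokenizer(input_text, summary, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**encoded).logits
    probs = torch.softmax(logits, dim=-1)[0]
    entailment_idx = model.config.label2id["ENTAILMENT"]  # label names vary by checkpoint
    return probs[entailment_idx].item()

score = entailment_probability(
    "The meeting was moved from Tuesday to Thursday at 3 pm.",
    "The meeting now takes place on Thursday afternoon.",
)
print(f"Summarization consistency (entailment probability): {score:.3f}")
```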

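The three RAG metrics use an LLM as a judge that returns a binary verdict. The sketch below shows one way such a check could look for Response Faithfulness; the prompt wording, model name, and OpenAI client usage are illustrative assumptions, not DynamoGuard's actual scoring prompts.

```python
# Minimal sketch: binary LLM-as-judge scoring in the style of the RAG hallucination
# metrics. The prompt, model name, and OpenAI client are illustrative assumptions,
# not DynamoGuard's internal scoring implementation.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

FAITHFULNESS_PROMPT = (
    "You are evaluating a RAG system.\n"
    "Retrieved context:\n{context}\n\n"
    "Model response:\n{response}\n\n"
    "Is every claim in the response supported by the retrieved context? "
    "Answer with a single word: YES or NO."
)

def response_faithfulness(context: str, response: str) -> bool:
    """Return True if the judge LLM finds the response faithful to the retrieved context."""
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": FAITHFULNESS_PROMPT.format(context=context, response=response),
        }],
        temperature=0,
    )
    verdict = completion.choices[0].message.content.strip().upper()
    return verdict.startswith("YES")

faithful = response_faithfulness(
    context="Acme's refund window is 30 days from delivery.",
    response="You can request a refund within 30 days of delivery.",
)
print("Response Faithfulness:", "pass" if faithful else "fail (possible hallucination)")
```
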
Hallucination Policy Actions

DynamoGuard currently supports the following actions for hallucination policies:

  • Flag: allow user inputs and model outputs that contain hallucinations, but flag the input or output in the moderator view
  • Block: block user inputs or model outputs that contain hallucinations